Statistics 651
Introduction

Winter 2026

Objectives

  1. Understand the Bayesian paradigm and its value in statistics
  2. Use Bayes’ theorem to update probabilistic beliefs
  3. Quantitatively incorporate uncertainty into a decision-making process

Resources

  • Introduction to Monte Carlo integration
  • BDA3:
    • Ch 1.1-1.5
    • Ch 1.10
    • Ch 1.7-1.8 (optional examples)
  • BIDA: Ch 2.2

Bayesian paradigm

Two steps

  1. Build a joint probability model for known quantities (data) and unknown quantities (parameters).
  2. Use rules of conditional probability!

The antidote to a black box.

A note on notation

We will use a unified integration notation so that if random variable \(\theta \in \Theta\) has cumulative distribution function \(F\), then for some measurable set \(A \subseteq \Theta\),

\[ \mathbb{P}_F(A) = \int_A F(\mbox{d}\theta) \] which equals \[ \mathbb{P}_F(A) = \int_A f(\theta) \, \mbox{d}\theta \]

if \(\theta\) is continuous with density \(f\), and

\[ \mathbb{P}_F(A) = \sum_{\theta \in A} f(\theta) \]

if \(\theta\) is discrete with mass function \(f\).

A note on notation

With this notation, expectations can be written as follows. For some function \(h(\theta)\),

\[ \mathbb{E}[h(\theta)] = \int_\Theta h(\theta) \, F(\mbox{d}\theta) \] which equals \[ \mathbb{E}[h(\theta)] = \int_\Theta h(\theta) \, f(\theta) \, \mbox{d}\theta \]

if \(\theta\) is continuous with density \(f\), and

\[ \mathbb{E}[h(\theta)] = \sum_{\theta \in \Theta} h(\theta) \, f(\theta) \]

if \(\theta\) is discrete with mass function \(f\).
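This notation pairs naturally with Monte Carlo integration (see the Resources): draw \(\theta^{(1)}, \ldots, \theta^{(S)}\) from \(F\) and average \(h\) over the draws. A minimal sketch, where \(F = \text{Exponential}(1)\) and \(h(\theta) = \theta^2\) are arbitrary illustrative choices:

```python
import random
import statistics

# Monte Carlo approximation of E[h(theta)] = \int h(theta) F(dtheta):
# sample theta repeatedly from F, then average h over the draws.
# Here F = Exponential(1) and h(theta) = theta^2, so the exact answer is
# E[theta^2] = Var(theta) + E[theta]^2 = 1 + 1 = 2.
random.seed(651)

def h(theta):
    return theta ** 2

draws = [random.expovariate(1.0) for _ in range(100_000)]
estimate = statistics.fmean(h(t) for t in draws)
print(estimate)  # approximately 2
```

The same recipe works for any \(F\) we can sample from, continuous or discrete, which is what makes the unified notation convenient.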

Bayes’ theorem

Theorem

Let \(\{ \mathcal{S}, \mathcal{B}(\mathcal{S}), \mathbb{P} \}\) be a probability space.

Let \(\{ A_i : i = 1, 2, \ldots \}\) with \(A_i \in \mathcal{B}(\mathcal{S})\) be a partition of \(\mathcal{S}\), and let \(B \in \mathcal{B}(\mathcal{S})\) with \(\mathbb{P}(B) > 0\).

Then,

\[ \mathbb{P}(A_j \mid B) = \frac{\mathbb{P}(B \mid A_j) \, \mathbb{P}(A_j)}{ \sum_i \mathbb{P}(B \mid A_i) \, \mathbb{P}(A_i)} \, .\]

Sequential analysis

Bayes’ theorem can be used to coherently incorporate evidence to update beliefs.

Two paths to the same conclusion: condition on all of the data at once, or update sequentially as each observation arrives; Bayes’ theorem gives the same posterior either way.

Sequential analysis

Example


I have two coins

  • one has a heads side and a tails side,
  • the other has heads on both sides.

I randomly select one of the two coins and flip it. Heads. Which coin was selected?

I flip the same coin again. Heads. Which coin was selected?

Flip it again. Heads. Which coin was selected?




Sequential analysis

Two-headed coin example

Before first flip: \(\mathbb{P}(\text{fair}) = \mathbb{P}(\text{two-headed}) = 1/2\).

After first flip: \(\mathbb{P}(\text{two-headed} \mid H) = \dfrac{1 \cdot \tfrac12}{1 \cdot \tfrac12 + \tfrac12 \cdot \tfrac12} = \dfrac{2}{3}\).

After second flip: \(\mathbb{P}(\text{two-headed} \mid HH) = \dfrac{1 \cdot \tfrac12}{1 \cdot \tfrac12 + \tfrac14 \cdot \tfrac12} = \dfrac{4}{5}\).

After third flip: \(\mathbb{P}(\text{two-headed} \mid HHH) = \dfrac{1 \cdot \tfrac12}{1 \cdot \tfrac12 + \tfrac18 \cdot \tfrac12} = \dfrac{8}{9}\).

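The sequential updates in the two-coin example can be traced in a few lines of code. This is a sketch of Bayes’ theorem applied once per flip, with exact arithmetic via fractions:

```python
from fractions import Fraction

# Sequential Bayesian updating for the two-coin example.
# Hypotheses: "fair" coin (P(heads) = 1/2) and "two-headed" coin (P(heads) = 1).
prior = {"fair": Fraction(1, 2), "two-headed": Fraction(1, 2)}
likelihood_heads = {"fair": Fraction(1, 2), "two-headed": Fraction(1)}

belief = dict(prior)
for flip in range(1, 4):  # three heads observed in a row
    # Bayes' theorem: posterior is proportional to likelihood times prior
    unnorm = {c: likelihood_heads[c] * belief[c] for c in belief}
    total = sum(unnorm.values())
    belief = {c: p / total for c, p in unnorm.items()}
    print(f"After flip {flip}: P(two-headed | data) = {belief['two-headed']}")
```

Each posterior becomes the prior for the next flip, producing 2/3, 4/5, 8/9; processing all three heads in a single update gives the same final answer.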
Bayes’ theorem (modeling version)

Theorem

Let \(f(x \mid \theta)\) denote the joint density (or mass) function of “data” \(x\), conditional on “parameter” \(\theta \in \Theta\) (discrete or continuous) having “prior” distribution \(\Pi\).

Then for any \(A \in \mathcal{B}(\Theta)\),

\[ \mathbb{P}(\theta \in A \mid x) = \frac{ \int_A f(x \mid \theta) \, \Pi(\mbox{d}\theta)}{ \int_\Theta f(x \mid \theta) \, \Pi(\mbox{d}\theta)} \, \] defines a posterior distribution of \(\theta\).

Note that \(f(x \mid \theta)\) and \(\Pi\) together define a joint distribution over \(x\) and \(\theta\).
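As a numerical sketch of this ratio of integrals, take a binomial likelihood with a Uniform(0, 1) prior on \(\theta\) (both hypothetical choices, not fixed by the slides) and approximate the two integrals on a grid:

```python
import math

# Posterior probability P(theta in A | x) via the ratio of integrals,
# for a Binomial(n, theta) likelihood and a Uniform(0, 1) prior on theta.
# Data: x = 7 successes in n = 10 trials; the event is A = (0.5, 1).
# All of these modeling choices are illustrative.
n, x = 10, 7

def likelihood(theta):
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

# Midpoint-rule approximation of both integrals on a fine grid.
grid = [(i + 0.5) / 10_000 for i in range(10_000)]
numerator = sum(likelihood(t) for t in grid if t > 0.5) / 10_000
denominator = sum(likelihood(t) for t in grid) / 10_000
posterior_prob = numerator / denominator
print(posterior_prob)  # approximately 0.887
```

With a uniform prior the posterior here is Beta(8, 4) in closed form, so the grid approximation can be checked against the exact Beta tail probability.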

Sequential analysis

Example

Bayes’ original thought experiment: An Essay Towards Solving a Problem in the Doctrine of Chances, published in 1763.

  1. Throw a white ball and (secretly) record its random horizontal placement.
  2. Repeatedly throw other balls and report ONLY whether they ended up to the left or to the right of the original white ball.
  3. Infer the horizontal placement of the white ball.
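The three steps above can be simulated directly (the sample size and seed are arbitrary choices). With a uniform prior on the placement, counting how many balls land to the left of the white ball yields a Beta posterior:

```python
import random

# Simulation sketch of Bayes' billiard-table thought experiment.
random.seed(1763)

white = random.random()   # step 1: secret horizontal placement in [0, 1]
n = 1000                  # step 2: throw n more balls, report left/right only
lefts = sum(random.random() < white for _ in range(n))

# Step 3: with a Uniform(0, 1) prior on the placement, the posterior is
# Beta(lefts + 1, n - lefts + 1); its mean is the classic Laplace estimate.
posterior_mean = (lefts + 1) / (n + 2)
print(f"true placement: {white:.3f}, posterior mean: {posterior_mean:.3f}")
```

With a thousand left/right reports the posterior mean lands close to the secret placement, even though no ball position was ever observed directly.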

Sequential analysis

Example (Bayes’ experiment)

[Figures: the posterior for the white ball’s placement, updated after each left/right report.]

Why Bayes?

Uncertainty

Propagation of uncertainty

  • Unified methods for any model (no need to transform to normal to access methodology)
  • Inference for any quantities of interest
  • Inference incorporating model uncertainty

Simultaneous estimation

When using a single model, the posterior distribution contains all necessary information for inferences on the unobserved variables.

This eliminates the need to estimate components in sequence.

Latent variables

Decision analysis

Examples

Loss functions

Definition

Let \(\theta\) be an unknown state from a space of possible states \(\Theta\),
and let \(a\) be an available action from set \(A\).

A function \(L(\theta, a) \in \mathbb{R}\), so that \(L(\theta, a) > -\infty\) for every state-action pair, is called a loss function.

Equivalently, we can call \(U(\theta, a) := -L(\theta, a)\) a utility function.

Loss functions

Example

Suppose you are on a first date, and you are interested in continuing to date this person.

The “state” is this person’s reciprocal interest, which is unknown to you.
Let \(\theta \in [0,1]\) with \(0\) being total disinterest.

The “actions” available to you are
\(a_1=\) Give up and move on,
\(a_2=\) Let your date make the next move, and
\(a_3=\) Invite them on a second date.

Create a two-way table defining a valid loss (or utility) function for each state (discretized or step function) and action pair.

Expected loss

Definition

Given a loss function \(L(\theta, a) \in \mathbb{R}\) and probability distribution \(\Pi\) on \(\theta\), define the expected loss for any action as

\[\mathbb{E}_\Pi[L(\theta, a)] := \int_\Theta L(\theta, a) \, \Pi(\mbox{d}\theta) \, .\]

Expected loss

Dating example

Suppose our belief about \(\theta\) can be expressed with a beta distribution.

Calculate \(\mathbb{E}_\Pi[L(\theta, a)]\) under each possible action.
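One way to carry out this calculation, assuming a Beta(2, 2) prior and losses that are linear in \(\theta\) (both hypothetical illustrative choices): linearity means the expected loss only requires \(\mathbb{E}[\theta] = a/(a+b)\).

```python
# Expected loss in the dating example under a Beta(a, b) prior on theta.
# The Beta(2, 2) prior and the linear losses L(theta, action) = c0 + c1*theta
# below are hypothetical choices for illustration.
a, b = 2, 2
mean_theta = a / (a + b)             # E[theta] under a Beta(a, b) prior

actions = {                          # (c0, c1) so that L(theta, action) = c0 + c1*theta
    "give up":     (0, 10),          # costly when reciprocal interest is high
    "wait":        (3, 0),           # fixed, moderate cost regardless of theta
    "second date": (8, -8),          # costly when reciprocal interest is low
}

expected_loss = {act: c0 + c1 * mean_theta for act, (c0, c1) in actions.items()}
best = min(expected_loss, key=expected_loss.get)
print(expected_loss, best)
```

Picking the action with the smallest expected loss is precisely the decision rule "minimize expected loss" discussed next; under a different prior (say, a date that went very well) the ranking of actions can change.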

Making decisions

Definition

A decision rule maps available information in data \(x\) to an action.

Frequentist risk averages the loss with respect to the distribution of \(x\) instead of \(\theta\).

Bayes risk averages over both.

Decisions can then be made by choosing the action or rule that:

  • minimizes prior \(\Pi(\theta)\) or posterior \(\Pi(\theta \mid x)\) expected loss
  • minimizes “risk” or “Bayes risk”
  • follows the “minimax” principle

See Statistical Decision Theory and Bayesian Analysis by James Berger.